Text Summarization Using Transformers

TL;DR

Dual-mode summarisation system: BART (facebook/bart-large-cnn) for abstractive output and MiniLM + cosine similarity for extractive output. Handles PDF and .txt uploads. A sentence-boundary chunking strategy works around BART's 1024-token hard limit. ROUGE scores evaluated on a 500-sample subset of the CNN/DailyMail dataset.

Tags: Python, NLP, Transformers, BART, MiniLM, Abstractive Summarization, Extractive Summarization, HuggingFace

Project Overview

This project implements both abstractive and extractive summarisation using pretrained transformer models. The dual-mode approach serves different use cases: abstractive summarisation (via BART) generates fluent, re-worded summaries suited to content digests and news articles; extractive summarisation (via MiniLM + cosine similarity) selects the most semantically relevant sentences from the original text and keeps their wording intact, which preserves factual accuracy for legal and technical documents.

The system accepts PDF and .txt file uploads alongside direct text input. A chunking strategy handles documents that exceed BART's 1024-token context window — long documents are split at sentence boundaries, summarised per chunk, then the chunk summaries are merged and re-summarised to produce the final output. Latency was measured across document lengths to characterise the performance envelope.

ROUGE Evaluation

Evaluated on a 500-sample subset of the CNN/DailyMail dataset:

Method	ROUGE-1	ROUGE-2	ROUGE-L
BART Abstractive	0.44	0.21	0.40
MiniLM Extractive	0.38	0.15	0.34

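BART's higher ROUGE scores reflect its ability to paraphrase and restructure content in the style of human-written reference summaries, helped by the fact that facebook/bart-large-cnn was fine-tuned on CNN/DailyMail. MiniLM's extractive approach scores lower on ROUGE but only reuses sentences from the source, so its output is factually safer and is preferred for legal and technical text where paraphrasing introduces risk.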
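To make the table's columns concrete, the following is a simplified sketch of how ROUGE-1, ROUGE-2 and ROUGE-L F1 can be computed for a single candidate/reference pair. It assumes lowercased whitespace tokenisation with no stemming, so its numbers will not match the scores reported above, and the function names are illustrative rather than taken from the project.

```python
from collections import Counter


def _tokens(text):
    """Lowercase whitespace tokenisation (an assumption; ROUGE tooling often also stems)."""
    return text.lower().split()


def _f1(overlap, cand_len, ref_len):
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """F1 over clipped n-gram overlap between candidate and reference."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(_tokens(candidate)), ngrams(_tokens(reference))
    overlap = sum((cand & ref).values())  # Counter intersection clips repeated n-grams
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """F1 over the longest common subsequence (LCS) of tokens."""
    a, b = _tokens(candidate), _tokens(reference)
    # Classic dynamic-programming LCS table.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return _f1(dp[-1][-1], len(a), len(b))
```

Averaging these per-pair scores over the evaluation subset would give corpus-level figures comparable in spirit to the table.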

Key Insights

Dual-mode approach is architecturally necessary — no single summarisation method works best across all document types and use cases.
Chunking at sentence boundaries (not character count) preserves semantic coherence within each chunk, producing better intermediate summaries than arbitrary truncation.
Cosine similarity threshold for extractive selection is tunable — lower thresholds produce longer, more complete summaries; higher thresholds produce tighter, more focused ones.
Latency scales non-linearly with document length for BART (each chunk is a separate model call), making extractive faster for very long documents.
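The threshold trade-off noted above can be illustrated with a small sketch. It assumes the threshold is applied to each sentence's relevance score; the scores are made-up values and `select_by_threshold` is a hypothetical helper, not the project's code.

```python
def select_by_threshold(sentences, scores, threshold):
    """Keep sentences whose relevance score meets the threshold, in document order."""
    return [s for s, score in zip(sentences, scores) if score >= threshold]


# Illustrative (made-up) cosine similarities between each sentence and the document.
sentences = ["S1", "S2", "S3", "S4", "S5"]
scores = [0.82, 0.41, 0.67, 0.55, 0.30]

loose = select_by_threshold(sentences, scores, threshold=0.40)  # longer summary
tight = select_by_threshold(sentences, scores, threshold=0.60)  # shorter summary
```

With these values the lower threshold keeps four sentences and the higher one keeps two.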

Technical Challenges & Solutions

BART's 1024-token hard limit: Long documents silently truncated with naive usage. Solved by implementing sentence-boundary chunking — document split into ~800-token chunks (buffer for special tokens), each summarised independently, then chunk summaries merged and re-summarised.

Chunk boundary semantic breaks: Splitting at fixed token counts cut sentences mid-thought. Solved by using NLTK sentence tokenisation to identify clean split points before applying token counting.
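A minimal sketch of sentence-boundary chunking along these lines follows. To stay self-contained it substitutes a naive regex splitter for NLTK's sentence tokeniser and a whitespace word count for the BART tokenizer, so both helpers are stand-ins, and all function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive regex splitter, a stand-in for NLTK sentence tokenisation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(sentence):
    """Whitespace word count, a stand-in for the BART tokenizer's token count."""
    return len(sentence.split())


def chunk_by_sentences(text, max_tokens=800):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk;
    the caller would still need to truncate or split it further.
    """
    chunks, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk before it would overflow the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In a real pipeline, `count_tokens` would more plausibly measure length with the model's own tokenizer, for example `len(tokenizer.encode(sentence, add_special_tokens=False))`, since word counts underestimate subword token counts.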

MiniLM sentence scoring: Extractive model selected topically redundant sentences (different wording, same information). Added a post-selection diversity filter using pairwise cosine similarity to remove near-duplicate selections.
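One way such a diversity filter could look is sketched below, assuming candidates arrive ordered best-first and a sentence is dropped when it is too similar to any sentence already kept. The 0.85 cut-off comes from the pipeline description in this write-up; the greedy strategy and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity_filter(ranked, embeddings, max_sim=0.85):
    """Greedy near-duplicate removal.

    ranked:     sentence indices ordered from most to least relevant.
    embeddings: sequence of vectors, indexed by sentence index.
    Returns the kept indices, still in relevance order.
    """
    kept = []
    for idx in ranked:
        # Keep a sentence only if it is dissimilar to everything kept so far.
        if all(cosine(embeddings[idx], embeddings[k]) < max_sim for k in kept):
            kept.append(idx)
    return kept
```

Because the filter is greedy, the higher-ranked sentence of any near-duplicate pair is the one that survives.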

Technical Implementation

Abstractive Pipeline (BART):
- facebook/bart-large-cnn from HuggingFace Transformers.
- Input → NLTK sentence tokenisation → chunk splitting at ~800 tokens → per-chunk summarisation → merge → final summarisation.
- Configurable max_length and min_length for output control.
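The summarise, merge and re-summarise flow listed above can be sketched as follows. The orchestration takes any callable as the summariser so it can be exercised without loading the model; the Hugging Face wrapper uses the standard `summarization` pipeline, but the default lengths, the single-chunk shortcut and the function names are assumptions rather than the project's settings.

```python
def summarise_chunks(chunks, summarise):
    """Summarise each chunk, then summarise the concatenated chunk summaries.

    `summarise` is any callable mapping a string to a shorter string.
    """
    if not chunks:
        return ""
    partials = [summarise(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]  # assumption: a single chunk needs no merge pass
    return summarise(" ".join(partials))


def make_bart_summariser(max_length=150, min_length=40):
    """Wrap the Hugging Face summarization pipeline as a plain callable.

    transformers is imported lazily so summarise_chunks can be tested without it.
    The default lengths are illustrative values only.
    """
    from transformers import pipeline

    bart = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarise(text):
        # truncation=True guards against merged summaries that still exceed the limit.
        result = bart(text, max_length=max_length, min_length=min_length, truncation=True)
        return result[0]["summary_text"]

    return summarise
```

A call such as `summarise_chunks(chunks, make_bart_summariser())` would then run one model call per chunk plus one for the merge pass.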
Extractive Pipeline (MiniLM):
- sentence-transformers/all-MiniLM-L6-v2 encodes all sentences into 384-dim embeddings.
- Cosine similarity computed between each sentence and the full document centroid vector.
- Top-N sentences selected by score, then filtered for diversity (pairwise sim < 0.85 threshold).
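The centroid-scoring step can be sketched as below. The ranking logic works on plain lists of floats so it runs without the model; `encode_sentences` shows how the embeddings might be obtained with sentence-transformers. Returning the selected indices in document order is an assumption of this sketch, as are the function names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of the sentence vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def top_n_by_centroid(embeddings, n):
    """Indices of the n sentences most similar to the document centroid.

    The result is sorted back into document order so the summary reads naturally.
    """
    if not embeddings:
        return []
    c = centroid(embeddings)
    scores = [cosine(e, c) for e in embeddings]
    best = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)[:n]
    return sorted(best)


def encode_sentences(sentences):
    """Embed sentences with all-MiniLM-L6-v2 (library imported lazily)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(sentences).tolist()
```

The diversity filter described in the last bullet would run on the ranked candidates before they are put back into document order.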
Interface & File Handling:
- Streamlit frontend for interactive text and file upload.
- Flask backend exposing summarisation as a REST API for programmatic access.
- PDF extraction via PyMuPDF; .txt handled with encoding detection.
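The file-handling step might look roughly like the sketch below. The PDF branch uses PyMuPDF's `fitz.open` and `page.get_text`; the text branch tries a short, fixed list of encodings as a simple stand-in for real encoding detection, whose method is not specified here. The function names and the encoding list are assumptions.

```python
def decode_text(data, encodings=("utf-8-sig", "cp1252")):
    """Decode bytes by trying each encoding in order.

    A simple stand-in for encoding detection; latin-1 is the last resort
    because it can decode any byte sequence.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")


def extract_pdf_text(data):
    """Extract plain text from PDF bytes with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF

    with fitz.open(stream=data, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def load_document(filename, data):
    """Dispatch on file extension: PDFs go to PyMuPDF, everything else is decoded as text."""
    if filename.lower().endswith(".pdf"):
        return extract_pdf_text(data)
    return decode_text(data)
```

Passing the upload's bytes rather than a file path keeps the same function usable from both a web upload widget and an API request.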

Key Learnings

Token limits are an engineering problem, not a reason to avoid a model: BART's 1024-token limit does not rule it out for long documents; it is a preprocessing and chunking challenge to solve.
ROUGE is a necessary but insufficient evaluation metric — a summary can score well on ROUGE while being factually incorrect (hallucinated details). Qualitative review of extractive vs abstractive outputs on the same document reveals differences that ROUGE cannot capture.
Sentence embeddings are powerful for ranking relevance: MiniLM's 384-dim semantic space encodes meaning effectively enough that cosine similarity alone produces useful extractive rankings without any task-specific fine-tuning.
API design matters for NLP tools — exposing summarisation as a REST endpoint (not just a UI) makes it composable into larger pipelines (e.g., a document processing workflow that summarises before indexing).
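As an illustration of the last point, here is a hedged sketch of a summarisation endpoint. The route `/summarise`, the `text` and `mode` fields, and both function names are hypothetical; the request logic is kept in a plain function so it can be tested without running a server, and Flask is imported lazily.

```python
def handle_summarise(payload, summarisers):
    """Validate a JSON payload and dispatch to the chosen summariser.

    summarisers maps a mode name (e.g. "abstractive") to a callable str -> str.
    Returns (response_body, http_status).
    """
    if not isinstance(payload, dict):
        return {"error": "JSON object expected"}, 400
    text = payload.get("text")
    mode = payload.get("mode", "abstractive")
    if not isinstance(text, str) or not text.strip():
        return {"error": "'text' must be a non-empty string"}, 400
    if not isinstance(mode, str) or mode not in summarisers:
        return {"error": "unknown mode"}, 400
    return {"mode": mode, "summary": summarisers[mode](text)}, 200


def create_app(summarisers):
    """Build a Flask app around handle_summarise (requires Flask 2.0+ for app.post)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/summarise")
    def summarise_route():
        # silent=True yields None for a missing or malformed JSON body.
        body, status = handle_summarise(request.get_json(silent=True), summarisers)
        return jsonify(body), status

    return app
```

A downstream pipeline could then POST `{"text": "...", "mode": "extractive"}` and index the returned `summary` field.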

Future Work

Fine-tune BART on domain-specific data (legal, medical) rather than relying on the checkpoint fine-tuned on CNN/DailyMail, since training on generic news limits performance on technical documents.
Add a faithfulness evaluation step using NLI (Natural Language Inference) to detect hallucinated facts in abstractive summaries.
Implement async processing for long documents — chunked BART inference is slow synchronously; queuing with Celery + Redis would improve user experience significantly.

Built by Om Patel — ML Engineer & Data Scientist.